Methods to Reduce I/O for Decision Tree Classifiers

Authors

  • Vineet Singh
  • Anurag Srivastava
Abstract

Classification is an important data mining problem. Although datasets can be quite large in data mining applications, using the entire training dataset rather than a sample can be advantageous because it can increase accuracy. I/O is a significant component of overall execution time in many decision tree classifiers. We present some new optimizations that work with many of these classifiers on both sequential and parallel processors. For ease of explanation, we describe these optimizations mostly in the context of SPRINT, a classifier developed recently for large problems where the training datasets may be disk resident.

Similar resources

MMDT: Multi-Objective Memetic Rule Learning from Decision Tree

In this article, a Multi-Objective Memetic Algorithm (MA) for rule learning is proposed. Prediction accuracy and interpretation are two measures that conflict with each other. In this approach, we consider accuracy and interpretation of rule sets. Additionally, individual classifiers face other problems such as huge sizes, high dimensionality and imbalanced class distributions in data sets. This...

A novel hybrid method for vocal fold pathology diagnosis based on Russian language

In this paper, first, an initial feature vector for vocal fold pathology diagnosis is proposed. Then, for optimizing the initial feature vector, a genetic algorithm is proposed. Some experiments are carried out for evaluating and comparing the classification accuracies which are obtained by the use of the different classifiers (ensemble of decision tree, discriminant analysis and K-nearest neig...

Cost Complexity Pruning of Ensemble Classifiers

In this paper we study methods that combine multiple classification models learned over separate data sets in a distributed database setting. Numerous studies posit that such approaches provide the means to efficiently scale learning to large datasets, while also boosting the accuracy of individual classifiers. These gains, however, come at the expense of an increased demand for run-time system...

Comparing different stopping criteria for fuzzy decision tree induction through IDFID3

Fuzzy Decision Tree (FDT) classifiers combine decision trees with approximate reasoning offered by fuzzy representation to deal with language and measurement uncertainties. When an FDT induction algorithm utilizes stopping criteria for early stopping of the tree's growth, threshold values of stopping criteria will control the number of nodes. Finding a proper threshold value for a stopping crite...

Performance Evaluation of Decision Tree Classifiers on Medical Datasets

In data mining, classification is one of the significant techniques, with applications in fraud detection, artificial intelligence, medical diagnosis and many other fields. Classification of objects based on their features into predefined categories is a widely studied problem. Decision trees are very useful to physicians in diagnosing a patient's problem. Decision tree classifiers are use...

Publication date: 2008